A Post by Michael B. Spring

Bookmarks and Meaning (December 15, 2007)

Social bookmarking systems provide a new source of information about resources. In this post, I try to set out a conceptual view of social bookmarking as a basis for asking what might be derived from an analysis of social bookmarks. The delicious system works as follows:

A user posts a URL
To save the URL, the user must describe it; the description could default to the page title, but it may be more bookmarker-centered than page-author-centered
The user may add user notes and tags
The user may decide not to share the bookmark, making it private

With this in mind, at the very least, a social bookmarking system would include a record that consists of a URLID (the normalized URL), a USERID, a DESCription, OPTTag(s), OPTNotes, and SHARE (default TRUE). A conceptual table such as this has the potential to provide the following information:

The number of URLs that have been recorded
The number of users of the system
The number of user-URLs that are marked private
The number of user-URLs that are shared
The number of URLs that are tagged
The number of user-URLs that have user notes
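
As a minimal sketch of the conceptual table and the counts above, the record might be modeled as below. The class name, the field names, and the sample rows are my own illustrative assumptions; nothing here reflects how delicious actually stores its data, and URL normalization is assumed to have happened already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bookmark:
    """One row of the conceptual table (field names are illustrative)."""
    url_id: str          # URLID: the already-normalized URL
    user_id: str         # USERID
    description: str     # DESCription (required)
    tags: tuple = ()     # OPTTag(s)
    notes: str = ""      # OPTNotes
    share: bool = True   # SHARE, default TRUE


def collection_counts(bookmarks):
    """Return the six collection-level counts listed in the post."""
    return {
        "urls": len({b.url_id for b in bookmarks}),
        "users": len({b.user_id for b in bookmarks}),
        "private_user_urls": sum(not b.share for b in bookmarks),
        "shared_user_urls": sum(b.share for b in bookmarks),
        "tagged_urls": len({b.url_id for b in bookmarks if b.tags}),
        "user_urls_with_notes": sum(bool(b.notes) for b in bookmarks),
    }


# Made-up sample data, for illustration only.
SAMPLE = [
    Bookmark("http://example.org/a", "u1", "Intro to tagging", ("tagging", "web")),
    Bookmark("http://example.org/a", "u2", "tagging primer", ("tagging",), notes="read later"),
    Bookmark("http://example.org/b", "u1", "My bank", share=False),
]

print(collection_counts(SAMPLE))
```

Each row is one user-URL pair, so counts of "user-URLs" are counts of rows, while counts of URLs and users are counts of distinct values.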

For users, we can determine the following information:

The minimum, maximum, average, median number of total, shared, and private URLs/user
Various measures of the variance in the total, shared, and private URLs across users
The minimum, maximum, average, median number of tags/user
Various measures of the variance in the number of tags across users
The minimum, maximum, average, median number of descriptions/user
Various measures of the variance in the number of descriptions across users
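
The per-user measures could be computed along the following lines. This is only a sketch: the input format (one `(user_id, url_id, share)` tuple per bookmark), the function names, and the choice of population variance as the "measure of variance" are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, median, pvariance


def summarize(values):
    """Min, max, mean, median, and population variance of a list of counts."""
    values = list(values)
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "variance": pvariance(values),
    }


def urls_per_user(rows):
    """rows: iterable of (user_id, url_id, share), one per bookmark."""
    total = Counter()
    shared = Counter()
    for user_id, _url_id, share in rows:
        total[user_id] += 1
        if share:
            shared[user_id] += 1
    users = sorted(total)
    return {
        "total": summarize(total[u] for u in users),
        "shared": summarize(shared[u] for u in users),
        # A user with no private bookmarks contributes a zero here.
        "private": summarize(total[u] - shared[u] for u in users),
    }


# Made-up rows: u1 has 3 bookmarks, u2 has 1, u3 has 2.
ROWS = [
    ("u1", "a", True), ("u1", "b", True), ("u1", "c", False),
    ("u2", "a", True),
    ("u3", "b", False), ("u3", "d", False),
]
stats = urls_per_user(ROWS)
print(stats["total"])
```

The same `summarize` helper would apply unchanged to tags per user or descriptions per user once those counts are collected.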

For URLs, we can determine:

The minimum, maximum, average, median number of total, shared, and private users/URL
Various measures of the variance in the total, shared, and private users across URLs
The minimum, maximum, average, median number of tags/URL
Various measures of the variance in the number of tags across URLs
The minimum, maximum, average, median number of unique tags/URL
Various measures of the variance in the number of unique tags across URLs
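
The difference between tags per URL and unique tags per URL shows up as soon as several users bookmark the same page. A small sketch, assuming rows of `(url_id, user_id, tags)` and treating tags as case-sensitive strings (both are my assumptions, not part of the system description):

```python
from collections import defaultdict


def tags_per_url(rows):
    """Map each URL to (total tag assignments, number of distinct tags)."""
    assigned = defaultdict(list)
    for url_id, _user_id, tags in rows:
        # extend() also registers URLs that were bookmarked without tags.
        assigned[url_id].extend(tags)
    return {url: (len(t), len(set(t))) for url, t in assigned.items()}


# Made-up rows: two users tag page "a", one user saves "b" with no tags.
ROWS = [
    ("a", "u1", ("python", "tutorial")),
    ("a", "u2", ("python",)),
    ("b", "u1", ()),
]
print(tags_per_url(ROWS))
```

Here page "a" has three tag assignments but only two unique tags, which is the gap that the two sets of measures above are meant to capture.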

Beyond these measures, we can examine a number of issues:

Looking at tags, ordered by frequency of occurrence:
- are there obvious groupings of types of tags (semantic, affective, personal)?
- do the most frequently occurring tags tell us anything about the collection?
- are there patterns in the co-occurrence of tags? That is, for some threshold of frequency of co-occurrence across URLs, is there a clear relationship between the co-occurring terms that allows us to simplify or clarify the tagging? Does the same hold for low co-occurrence terms, i.e. can we say some things about those terms?
- is it possible to develop a tag map that would work as follows: take the n most frequently occurring terms and set them around the circumference of a circle. Take any term that co-occurs with one of those terms more than x% (e.g. 90%) of the time and bundle it with the more frequently occurring term. (If the bundled term was one of the original n, add a new term to the circle to bring it back to n.) Take terms that co-occur 50-90% of the time and place them on strings proportionally distant from the terms they co-occur with. If they co-occur with two or three terms on the circle, web them such that they are proportionally distant from all of those terms. If they only co-occur with one term, fan them outside the circle proportionally distant from that term. What kind of term map does that provide, and how might it be improved?
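
The bundling and stringing steps of the tag map can be sketched in a few lines. Several things here are assumptions on my part: the co-occurrence rate of a term with a circle term is taken to be the fraction of the term's URLs that also carry the circle term, the refinement of replacing a bundled circle term is skipped, and no geometry is drawn (the rates are only the raw material for the "proportional distance"). All names are illustrative.

```python
from collections import Counter


def tag_map(url_tags, n=2, bundle_at=0.9, string_at=0.5):
    """url_tags: dict mapping url_id -> set of tags on that URL."""
    # Number of URLs each tag appears on.
    freq = Counter(t for tags in url_tags.values() for t in set(tags))
    hubs = [t for t, _ in freq.most_common(n)]   # terms on the circle
    bundles = {h: [] for h in hubs}              # terms merged into a hub
    strings = {}                                 # term -> {hub: rate}
    for term in sorted(freq):
        if term in hubs:
            continue
        rates = {}
        for h in hubs:
            both = sum(1 for tags in url_tags.values()
                       if term in tags and h in tags)
            rates[h] = both / freq[term]
        best = max(rates, key=rates.get)
        if rates[best] > bundle_at:
            bundles[best].append(term)
        else:
            linked = {h: r for h, r in rates.items() if r >= string_at}
            if linked:
                # One hub: fan outside the circle; several: web between them.
                strings[term] = linked
    return hubs, bundles, strings


# Made-up data, for illustration only.
URL_TAGS = {
    "u1": {"python", "py", "tutorial"},
    "u2": {"python", "py"},
    "u3": {"python", "web", "django"},
    "u4": {"python", "web", "django"},
    "u5": {"python", "web"},
    "u6": {"web", "css", "tutorial", "django"},
    "u7": {"css"},
}
hubs, bundles, strings = tag_map(URL_TAGS)
print(hubs, bundles, strings)
```

In this toy data "py" is bundled with "python" and "django" with "web", "tutorial" is webbed between both circle terms, and "css" fans out from "web" alone. Whether the rate should instead be symmetric (for example, a Jaccard measure) is one of the open design questions.
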
When we look at tags by users:
- can we identify communities of interest? (common frequently occurring tags)
- can we identify expertise? (high number of URLs with levels of commonly used tags)
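
One simple way to look for communities of interest is to compare the sets of each user's most frequent tags. The sketch below uses Jaccard overlap of the top-k tags; the choice of measure, the values of k and the threshold, and all names are assumptions made for illustration rather than a proposed method.

```python
from collections import Counter
from itertools import combinations


def top_tags(user_tags, k=3):
    """user_tags: dict user_id -> list of tags, one entry per tag use."""
    return {u: {t for t, _ in Counter(tags).most_common(k)}
            for u, tags in user_tags.items()}


def jaccard(a, b):
    """Size of the intersection over size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_pairs(user_tags, k=3, threshold=0.5):
    """Pairs of users whose top-k tag sets overlap at or above the threshold."""
    tops = top_tags(user_tags, k)
    return [(u, v) for u, v in combinations(sorted(tops), 2)
            if jaccard(tops[u], tops[v]) >= threshold]


# Made-up tag histories, for illustration only.
USER_TAGS = {
    "alice": ["python", "python", "python", "web", "web", "css"],
    "bob": ["python", "python", "web", "web", "java"],
    "carol": ["cooking", "cooking", "recipes", "recipes", "python"],
}
print(candidate_pairs(USER_TAGS, k=2))
```

Pairs found this way could then be merged into larger groups; the sketch stops at pairs to keep the idea visible.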

There are surely many more questions that we might try to answer and there are surely more formal ways of formulating what might be inferred. I will be returning to this entry in the coming months and trying to add more thoughts about this.